Let’s load the Prosper data and take a look at the number of rows and columns.
Row count:
## [1] 113937
Column count:
## [1] 81
There are 113937 listings in the dataset with 81 variables. For the scope of this project, I am going to limit the number of variables. The question is which ones.
Looking at how Prosper works[1], I added variables that fit the following criteria:
## 'data.frame': 113937 obs. of 15 variables:
## $ DelinquenciesLast7Years : int 4 0 0 14 0 0 0 0 0 0 ...
## $ PublicRecordsLast10Years: int 0 1 0 0 0 0 0 1 0 0 ...
## $ DebtToIncomeRatio : num 0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
## $ BankcardUtilization : num 0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
## $ RevolvingCreditBalance : num 0 3989 NA 1444 6193 ...
## $ DaysWithCreditLine : num 5128 7161 4839 11928 4266 ...
## $ InquiriesLast6Months : int 3 3 0 0 1 0 0 3 1 1 ...
## $ LoanOriginalAmount : int 9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
## $ ListingCategory : Factor w/ 21 levels "Not available",..: 1 3 1 17 3 2 2 3 8 8 ...
## $ EmploymentStatus : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
## $ AnnualIncome : num 37000 73500 25000 34500 115000 ...
## $ BorrowerRate : num 0.158 0.092 0.275 0.0974 0.2085 ...
## $ Term : Factor w/ 3 levels "12","36","60": 2 2 2 2 2 3 2 2 2 2 ...
## $ ProsperRating : Factor w/ 7 levels "AA","A","B","C",..: NA 2 NA 2 5 3 6 4 1 1 ...
## $ ListingCreationDate : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
Let’s take a look at the data summary:
## DelinquenciesLast7Years PublicRecordsLast10Years DebtToIncomeRatio
## Min. : 0.000 Min. : 0.0000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.140
## Median : 0.000 Median : 0.0000 Median : 0.220
## Mean : 4.155 Mean : 0.3126 Mean : 0.276
## 3rd Qu.: 3.000 3rd Qu.: 0.0000 3rd Qu.: 0.320
## Max. :99.000 Max. :38.0000 Max. :10.010
## NA's :990 NA's :697 NA's :8554
## BankcardUtilization RevolvingCreditBalance DaysWithCreditLine
## Min. :0.000 Min. : 0 Min. : 1038
## 1st Qu.:0.310 1st Qu.: 3121 1st Qu.: 5704
## Median :0.600 Median : 8549 Median : 7299
## Mean :0.561 Mean : 17599 Mean : 7648
## 3rd Qu.:0.840 3rd Qu.: 19521 3rd Qu.: 9278
## Max. :5.950 Max. :1435667 Max. :24900
## NA's :7604 NA's :7604 NA's :697
## InquiriesLast6Months LoanOriginalAmount ListingCategory
## Min. : 0.000 Min. : 1000 Debt consolidation:58308
## 1st Qu.: 0.000 1st Qu.: 4000 Not available :16965
## Median : 1.000 Median : 6500 Other :10494
## Mean : 1.435 Mean : 8337 Home improvement : 7433
## 3rd Qu.: 2.000 3rd Qu.:12000 Business : 7189
## Max. :105.000 Max. :35000 Auto : 2572
## NA's :697 (Other) :10976
## EmploymentStatus AnnualIncome BorrowerRate Term
## Employed :67322 Min. : 0 Min. :0.0000 12: 1614
## Full-time :26355 1st Qu.: 38404 1st Qu.:0.1340 36:87778
## Self-employed: 6134 Median : 56000 Median :0.1840 60:24545
## Not available: 5347 Mean : 67296 Mean :0.1928
## Other : 3806 3rd Qu.: 81900 3rd Qu.:0.2500
## : 2255 Max. :21000035 Max. :0.4975
## (Other) : 2718
## ProsperRating ListingCreationDate
## C :18345 2013-10-02 17:20:16.550000000: 6
## B :15581 2013-08-28 20:31:41.107000000: 4
## A :14551 2013-09-08 09:27:44.853000000: 4
## D :14274 2013-12-06 05:43:13.830000000: 4
## E : 9795 2013-12-06 11:44:58.283000000: 4
## (Other):12307 2013-08-21 07:25:22.360000000: 3
## NA's :29084 (Other) :113912
There are several sharp lines in the loan amounts. No surprise here: people tend to borrow round numbers. Let’s take a look at the statistics.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
The minimum loan is 1000, with the median of 6500 and mean of 8337.
Let’s see the most common loan amount.
## [1] 4000
Interesting to note that 4000 is the most common amount people borrowed, followed by 10000 and 15000.
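A minimal sketch of how the most common amounts can be found in base R. The amounts below are made-up toy values, not the Prosper data, and the variable names are only for illustration.

```r
# Toy loan amounts (made-up values for illustration only)
amounts <- c(4000, 10000, 4000, 15000, 10000, 4000, 2500, 4000, 10000, 15000)

# Count each distinct amount and sort from most to least frequent
amount_counts <- sort(table(amounts), decreasing = TRUE)
print(amount_counts)

# Table names are character strings, so convert them back to numbers
most_common <- as.numeric(names(amount_counts)[1])
top_three <- as.numeric(names(amount_counts)[1:3])
print(most_common)
print(top_three)
```

Sorting the whole frequency table, rather than taking only the maximum, also gives the runners-up in one step.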
The maximum loan requested is 35000. Let’s see how many listings asked for that much.
## [1] 430
Not many: only 430 out of all the observations.
## Not available Debt consolidation
## 16965 58308
## Home improvement Business
## 7433 7189
## Personal loan Student use
## 2395 756
## Auto Other
## 2572 10494
## Baby & Adoption Loans Boat
## 199 85
## Cosmetic Procedures Engagement Ring Financing
## 91 217
## Green Loans Household Expenses
## 59 1996
## Large Purchases Medical/Dental
## 876 1522
## Motorcycle RV
## 304 52
## Taxes Vacation
## 885 768
## Wedding Loans
## 771
Most people borrow to consolidate their debts: in total there are 58308 such cases, or about 51.18% of all listings.
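As a sketch, a share like this can be computed with `prop.table()` on a frequency table. The category vector below is toy data; only the last calculation uses the counts reported in this section.

```r
# Toy listing categories (made-up values for illustration only)
categories <- c("Debt consolidation", "Auto", "Debt consolidation", "Other",
                "Debt consolidation", "Business", "Other", "Debt consolidation")

# prop.table() turns the counts from table() into proportions
category_share <- round(100 * prop.table(table(categories)), 2)
print(category_share)

# Share of one category, as a plain number
debt_share <- as.numeric(category_share["Debt consolidation"])
print(debt_share)

# The same arithmetic applied to the counts reported in this section
reported_share <- round(100 * 58308 / 113937, 2)
print(reported_share)
```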
## Employed Full-time Not available Not employed
## 2255 67322 26355 5347 835
## Other Part-time Retired Self-employed
## 3806 1088 795 6134
Most borrowers are employed. There are 67322 employed borrowers or about 59%.
At binwidth=1000, we can see sharp lines around some amounts, which makes sense, since users tend to enter round numbers. The histogram is skewed to the right (positively skewed).
Another look at larger binwidth.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 38400 56000 67300 81900 21000000
The median for AnnualIncome is 56000, the mean is 67300. The maximum entry is 21 million (1750000 per month).
Most borrowers have no delinquencies in the last 7 years and no public records in the last 10 years. If I remove the borrowers with 0 delinquencies and 0 public records, I get:
Let’s take a look at the statistics for DelinquenciesLast7Years.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.000 0.000 4.155 3.000 99.000 990
##
## 0 1 2 3 4 5 6 7 8 9 10 11
## 76439 3967 2879 3183 2592 1826 1790 1648 1421 1208 1151 1075
## 12 13 14 15 16 17 18 19 20 21 22 23
## 982 873 821 795 731 608 574 540 565 472 421 439
## 24 25 26 27 28 29 30 31 32 33 34 35
## 423 347 330 317 296 287 248 214 225 190 190 201
## 36 37 38 39 40 41 42 43 44 45 46 47
## 147 153 144 148 113 106 128 101 110 81 90 94
## 48 49 50 51 52 53 54 55 56 57 58 59
## 78 74 72 72 55 40 40 39 53 30 31 34
## 60 61 62 63 64 65 66 67 68 69 70 71
## 41 34 36 31 28 34 27 22 20 20 15 13
## 72 73 74 75 76 77 78 79 80 81 82 83
## 14 17 9 22 10 15 10 8 12 4 12 6
## 84 85 86 87 88 89 90 91 92 93 94 95
## 8 3 7 7 9 5 7 4 6 2 3 4
## 96 97 98 99
## 4 4 3 110
While most borrowers have 0 delinquencies, there are still 3967 borrowers with exactly 1 delinquency in the last 7 years, and many more with 2 or more. There are also 110 borrowers with 99 delinquencies (the maximum) in the last 7 years.
And the statistics for PublicRecordsLast10Years.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.0000 0.0000 0.3126 0.0000 38.0000 697
##
## 0 1 2 3 4 5 6 7 8 9 10 11
## 85803 22834 3011 894 345 151 70 46 31 15 8 7
## 12 13 14 15 16 17 20 21 22 25 30 34
## 4 1 4 3 5 1 1 1 1 1 1 1
## 38
## 1
While most borrowers have 0 public records in the last 10 years, 22834 borrowers have exactly 1, and progressively fewer have 2 or more. The maximum number of public records is 38.
Debt to Income Ratio
The debt-to-income ratio is the percentage of a consumer’s monthly gross income that goes toward paying debts. The data is capped at 10.01: any debt-to-income ratio larger than 1000% is returned as 1001%.
After removing the upper quantile of the data, we get:
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.220 0.276 0.320 10.010 8554
The maximum is 10.01. As specified in the data definition, the debt-to-income ratio is always capped at 10.01 (1001%). The minimum value is 0, with a median of 0.22 and a mean of 0.276.
Revolving Credit Balance
Revolving Credit Balance is the total outstanding balance that the borrower owes on open credit cards or other revolving credit accounts.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 3121 8549 17600 19520 1436000 7604
The median is 8549 and mean 17600. The maximum value is 1436000. The minimum and the most common amount is 0.
Bankcard Utilization
Bankcard utilization is the sum of the balances owed on open bankcards divided by the sum of the cards’ credit limits. Lower usually means better.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.310 0.600 0.561 0.840 5.950 7604
There are interestingly 2 peaks in the plot, first there are a lot of borrowers who have almost 0% Bankcard Utilization and then another peak near 100%. There are some borrowers who have utilization > 1.00 (100%).
Number of borrowers with BankcardUtilization < 0.05:
## [1] 9361
Number of borrowers near 1:
## [1] 9532
Number of borrowers with BankcardUtilization >= 1:
## [1] 2574
There are 2574 borrowers with bankcard utilization >= 1. That means they owe at least as much as their credit limit.
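Counts like these can be sketched by summing a logical condition. The vector below is toy data, and `na.rm = TRUE` is my assumption about how the missing values were handled.

```r
# Toy utilization values (made-up), including one missing value
utilization <- c(0.00, 0.02, 0.45, 0.98, 1.00, 1.20, NA, 0.60)

# A comparison yields TRUE/FALSE/NA; sum() counts the TRUEs once NAs are dropped
n_low <- sum(utilization < 0.05, na.rm = TRUE)
n_at_or_over_limit <- sum(utilization >= 1, na.rm = TRUE)

print(n_low)
print(n_at_or_over_limit)
```

Without `na.rm = TRUE`, a single NA in the column would make the whole count NA.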
Length of credit history is the number of days from the date when the oldest account on the borrower’s credit record was opened till today.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1038 5704 7299 7648 9278 24900 697
The longest credit history is 24900 days, which is roughly 68 years.
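A hedged sketch of how a days-with-credit-line variable could be derived. The source column (a date for the first recorded credit line) and the fixed reference date are assumptions for illustration; the report measures up to "today", which would be `Sys.Date()`.

```r
# Assumed input: date of the borrower's first recorded credit line (toy values)
first_credit_line <- as.Date(c("2014-01-01", "1990-06-15", NA))

# Fixed reference date so the example is reproducible (hypothetical choice)
reference_date <- as.Date("2015-01-01")

# Subtracting Date objects gives a difference in days
days_with_credit_line <- as.numeric(reference_date - first_credit_line)
print(days_with_credit_line)

# Converting the reported maximum of 24900 days into years
max_years <- 24900 / 365.25
print(round(max_years, 1))
```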
## AA A B C D E HR NA's
## 5372 14551 15581 18345 14274 9795 6935 29084
The most common rating (excluding the NA’s) is C, with 18345 listings. Only 5372 listings, or about 4.71%, have an AA rating.
## 12 36 60
## 1614 87778 24545
Most loans have 36 months term or about 77%.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1340 0.1840 0.1928 0.2500 0.4975
The median borrower rate is 18.4% and the mean is 19.28%. The maximum borrower rate is 0.4975, or 49.75%. There are 6 observations with a borrower rate above 40%.
What is/are the main feature(s) of interest in your dataset?
The main features of the data are:
I chose these variables because they are visible in the UI[1].
What other features in the dataset do you think will help support your investigation into your feature(s) of interest?
I added ListingCreationDate. I added it just to see if there is “trend” in the behavior.
Did you create any new variables from existing variables in the dataset?
Yes, Days with credit line.
Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?
All of the money-related variables (LoanOriginalAmount, RevolvingCreditBalance and AnnualIncome) are positively skewed. I did not transform the data for the univariate analysis.
Let’s see the relationship between LoanOriginalAmount with ListingCategory.
The baby and adoption loans look similar to debt consolidation. Let’s take a look at the statistics.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 9500 9908 15000 35000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2000 4000 9000 9751 15000 30000
The top one is the debt consolidation summary and the bottom one is baby and adoption loans. Very close, but not identical.
Interesting to note that wedding loans are quite high. Let’s take a look at the numbers.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2000 4000 7500 8836 13000 35000
The median and mean are 7500 and 8836, with maximum value up to 35000.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 2500 4000 4873 6000 25000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 1600 3000 4089 5000 25000
Interesting to see that not employed borrowers request higher loans than part-time borrowers. The median and the mean for not employed borrowers are 4000 and 4873, versus 3000 and 4089 for part-time borrowers. The maximum loan is 25000 for both.
Now let’s look at the relationship between LoanOriginalAmount with AnnualIncome.
Most of the loans are below 10000 and most annual incomes are under 100000. The quantiles show that the higher the annual income, the higher the median loan original amount.
The number of listings with an original amount < 10000 and an annual income less than 100000 is:
## [1] 65553
which is around 57.53% of the data.
It seems that people who borrow > 25000 have an annual income of >= 100000. It looks like there is some kind of rule: if you borrow > 25000, the minimum annual income is 100000.
Let’s verify this a bit.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 100000 115000 137200 160200 175000 800000
And yes the minimum annual income is 100000. Let’s also check the correlation between the 2 variables.
## [1] 0.2012595
This indicates a weak positive relationship.
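A minimal sketch of such a correlation check with base R’s `cor()`. The numbers are toy values, and `use = "complete.obs"` is my assumption for handling rows with missing values; the default method is Pearson.

```r
# Toy data (made-up values), with one missing income
annual_income <- c(30000, 45000, 60000, NA, 120000, 150000)
loan_amount   <- c(3000, 5000, 4000, 8000, 15000, 25000)

# Pearson correlation, using only the rows where both values are present
r <- cor(annual_income, loan_amount, use = "complete.obs")
print(r)
```

A value near 0 suggests little linear relationship, while values near 1 or -1 suggest a strong one; the toy data here is far more strongly correlated than the real listings.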
Now let’s check the DebtToIncomeRatio with BorrowerRate.
That is not too informative. Let’s add some quantile information.
The median of the borrower rate gets higher as the debt to income ratio gets higher as well.
Another way to look at this is by breaking the DebtToIncomeRatio into several bins.
A quick look at the newly created variable.
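Binning like this can be sketched with `cut()`. The bin labels follow the ones that appear later in this report (0-10% up to 100-1000%); the exact break points and the choice of right-closed intervals are my assumptions, and the ratios are toy values.

```r
# Toy debt-to-income ratios (made-up), including the 10.01 cap and a missing value
ratio <- c(0.05, 0.17, 0.26, 0.99, 10.01, NA, 0)

# Ten 10%-wide bins up to 100%, plus one wide bin up to the 10.01 cap
breaks <- c(0:10 / 10, 10.01)
labels <- c(paste0(seq(0, 90, by = 10), "-", seq(10, 100, by = 10), "%"),
            "100-1000%")

# include.lowest = TRUE keeps a ratio of exactly 0 in the first bin
ratio_bin <- cut(ratio, breaks = breaks, labels = labels,
                 include.lowest = TRUE, right = TRUE)

# Quick look at the new factor, counting NAs as well
print(table(ratio_bin, useNA = "ifany"))
```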
Let’s have another look at the relationship between DebtToIncomeRatio with BorrowerRate.
We can see that the BorrowerRate median increases the higher the DebtToIncomeRatio. Let’s check the correlation between these 2 variables.
## [1] 0.06291678
The correlation is weak: it shows no or negligible linear relationship. This is contrary to my initial assumption, as I had expected a clear positive correlation.
Separating the debt to income ratio into bins provides an interesting look into the data, so let’s separate the borrower rate into bins as well.
Quick look at the result.
Let’s check the relationship with annual income.
These two plots show the same data, the first one using the newly created BorrowerRate bins. Generally these plots show that the higher your annual income, the lower your borrower rate.
Let’s take a look at the correlation.
## [1] -0.0889818
Again while the plot show some trend, the correlation between the two variables is negligible.
Let’s see the relationship of the borrower rate with other variables.
As BankcardUtilization, DelinquenciesLast7Years and PublicRecordsLast10Years increase, so does the borrower rate. On the other hand, the higher the RevolvingCreditBalance, the lower the BorrowerRate.
Let’s check the correlation.
Correlation between BorrowerRate with BankcardUtilization:
## [1] 0.255482
Correlation between BorrowerRate with RevolvingCreditBalance:
## [1] -0.05960823
Correlation between BorrowerRate with DelinquenciesLast7Years:
## [1] 0.1702787
Correlation between BorrowerRate with PublicRecordsLast10Years:
## [1] 0.1283138
Correlation between BorrowerRate with DaysWithCreditLine:
## [1] -0.0474466
Correlation between BorrowerRate with InquiriesLast6Months:
## [1] 0.18381
BankcardUtilization has a weak positive relationship. The other factors, while showing a positive relationship for DelinquenciesLast7Years and PublicRecordsLast10Years and a negative relationship for RevolvingCreditBalance, have a negligible relationship.
Let’s take a look at the relation of Term with other variables now.
The median for the LoanOriginalAmount increases as the terms get longer. Let’s verify this.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 2000 3500 4694 5000 25000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 3000 5000 7276 10000 35000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2000 8000 11500 12370 15000 35000
Entries with a 60-month term have a median of 11500, compared with 5000 for 36 months and 3500 for 12 months.
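Per-group statistics like these can be sketched with `tapply()`. The data frame below is toy data that only borrows the column names from this report.

```r
# Toy listings (made-up values) using the report's column names
loans <- data.frame(
  Term = factor(c("12", "12", "36", "36", "36", "60", "60"),
                levels = c("12", "36", "60")),
  LoanOriginalAmount = c(2000, 4000, 3000, 5000, 10000, 8000, 15000)
)

# Apply median() to the loan amounts within each Term level
median_by_term <- tapply(loans$LoanOriginalAmount, loans$Term, median)
print(median_by_term)

# The same pattern with summary() gives the full five-number summary per term
summary_by_term <- tapply(loans$LoanOriginalAmount, loans$Term, summary)
print(summary_by_term)
```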
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 44200 67000 82660 97930 7423000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 36000 54000 65290 80000 21000000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 45000 64000 73470 90000 1305000
The median annual income for the 12-month term is 67000, higher than the 54000 for the 36-month term.
The relationship with DebtToIncomeRatio
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0100 0.1100 0.1700 0.2202 0.2800 10.0100 199
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.210 0.282 0.310 10.010 6953
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0100 0.1700 0.2300 0.2565 0.3200 10.0100 1402
The median of the DebtToIncomeRatio increases as the term gets longer. For the 60-month term the median is 0.23, for the 36-month term it is 0.21, and for the 12-month term it is 0.17.
Another look, this time using DebtToIncomeRatio.bin.
The term distribution actually is pretty even across the DebtToIncomeRatio.bin. Let’s check the correlation.
## [1] -0.01467005
We see that the correlation is negligible.
Let’s look at borrower rate vs term as well.
This is quite interesting: listings with a borrower rate above 30% do not have a 12-month term.
Okay, we see the same result from the boxplot. Let’s look at the numbers.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0400 0.0929 0.1434 0.1501 0.2064 0.2669
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1274 0.1815 0.1935 0.2599 0.4975
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0669 0.1490 0.1870 0.1930 0.2319 0.3304
The median BorrowerRate for 60 months term is 0.1870, the highest among the 3.
Let’s check ProsperRating relationship with DebtToIncomeRatio and BorrowerRate now.
The better the rating, the lower the borrower rate. Let’s verify this.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.04000 0.06990 0.07790 0.07912 0.08450 0.21000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0498 0.0990 0.1119 0.1129 0.1239 0.2150
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0693 0.1414 0.1509 0.1545 0.1639 0.3500
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0895 0.1765 0.1914 0.1944 0.2099 0.3500
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1157 0.2287 0.2492 0.2464 0.2625 0.3500
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1479 0.2712 0.2925 0.2933 0.3149 0.3600
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1779 0.3134 0.3177 0.3173 0.3177 0.3600
The median borrower rate increases as we move to worse ratings.
Okay let’s now check out the Debt to income ratio.
It seems that at rating AA, the maximum DebtToIncomeRatio is 50%. Let’s inspect the data for this.
## 0-10% 10-20% 20-30% 30-40% 40-50% 50-60% 60-70%
## 1224 2339 1227 300 38 11 1
## 70-80% 80-90% 90-100% 100-1000% NA's
## 2 1 3 4 222
Our initial thought is not true. Even for the AA rating, we still have debt to income ratios > 50%. So there are listings where the Prosper rating is good but the debt to income ratio is more than 50%.
Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?
I wanted to see how several features affect the borrower rate, term and Prosper rating. I put several variables into “bins”, as this makes them a bit easier to work with. By doing this for the borrower rate, debt to income ratio and delinquencies, we can paint a clearer picture of the relationships between features.
We can see, for instance, that the borrower rate increases as the debt to income ratio increases. The term seems to be related to the loan original amount: the bigger the amount, the longer the term. The borrower rate also shows a slight increase as the delinquencies in the last 7 years go up.
Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?
No.
What was the strongest relationship you found?
If we plot the ProsperRating against other features, the plots become much clearer. For instance, we can quickly see that the debt to income ratio for rating AA is lower than for other ratings.
The borrower rate shows an even clearer picture. The better your rating, the lower your borrower rate. At rating C, the number of listings with a borrower rate between 0 and 10% is only 8.
Term is not affected by the rating: the most common term is 36 months across all ratings.
The debt to income ratio distribution in each rating is also interesting to see. While there is a pattern showing that high debt to income ratios tend to be more common in the lower ratings (HR), there seem to be exceptions. For instance, people with a huge debt to income ratio can also have a Prosper rating of AA, though not many: 4 out of 5150, which is almost zero percent.
Let’s compare the debt to income ratio again, this time faceted by rating.
From the bivariate analysis, the correlation between DebtToIncomeRatio and BorrowerRate is not actually that strong. This plot shows that. Most of the DebtToIncomeRatio values are clustered under 1. The really strong relationship is between the rating and the borrower rate. In fact, for ratings B, D and E, the median borrower rate decreases as the debt to income ratio rises.
Let’s take a look at the distribution instead.
If we look at the plot above, we see that for rating AA there is no borrower rate bin above 20%. So it shows the distribution of the borrower rate pretty well (better rating, better borrower rate). But if we look at the debt to income ratio bins, we see that they are all over the place. It is true that rating HR has more borrowers with a high debt to income ratio, but it also has borrowers with a low debt to income ratio.
Next, let’s see if we can add another dimension by using ListingCreationDate.
There are some differences in the year-to-year feature distribution within each rating. For instance, in the borrower rate distribution we can see that the borrower rate for rating AA in 2013 and 2014 is almost always between 0-10%. For rating B, the borrower rate seems to be 20-30% in 2011 and 2012, but 10-20% in 2013 and 2014.
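A year dimension like this can be sketched by pulling the year out of ListingCreationDate. The timestamps below are taken from the output earlier in this report; the ratings paired with them are made up, and extracting the date with `substr()` is just one possible approach.

```r
# Timestamps in the format stored in ListingCreationDate
created <- c("2005-11-09 20:44:28.847000000",
             "2013-10-02 17:20:16.550000000",
             "2013-12-06 05:43:13.830000000")

# Made-up ratings, only to demonstrate the cross-tabulation
rating <- c("AA", "B", "B")

# Keep the YYYY-MM-DD part, parse it as a Date, then extract the year
listing_date <- as.Date(substr(created, 1, 10))
listing_year <- as.integer(format(listing_date, "%Y"))
print(listing_year)

# Listings per year within each rating
print(table(listing_year, rating))
```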
Let’s do the same with debt to income ratio.
The distribution shows interesting patterns. Each rating seems to have allowed a higher debt to income ratio from year to year.
This plot shows the effect of several factors on ProsperRating. The data for each factor is scaled, grouped by Prosper rating and averaged. I use a line plot to show how each factor trends across ProsperRating. Since the data is scaled, each point also shows how far it is from the mean (the 0 line).
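The scale, group and average steps can be sketched in base R as follows. The data frame is toy data with only two factors; the choice of `scale()` and `aggregate()` is my assumption about how this could be done, not necessarily what was used for the plot.

```r
# Toy listings (made-up values) with two factors and a rating
listings <- data.frame(
  ProsperRating     = c("AA", "AA", "C", "C", "HR", "HR"),
  BorrowerRate      = c(0.07, 0.08, 0.19, 0.20, 0.31, 0.32),
  DebtToIncomeRatio = c(0.10, 0.15, 0.22, 0.25, 0.30, 0.45)
)
factor_cols <- c("BorrowerRate", "DebtToIncomeRatio")

# scale() centers each column on 0 and divides by its standard deviation
scaled <- as.data.frame(scale(listings[factor_cols]))
scaled$ProsperRating <- listings$ProsperRating

# Average the scaled values within each rating
means_by_rating <- aggregate(. ~ ProsperRating, data = scaled, FUN = mean)
print(means_by_rating)
```

Because every factor ends up on the same unitless scale, several of them can share one axis, and a group mean above or below 0 reads directly as above or below the overall average.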
Several things we can see:
This plot shows the borrower rate distribution for borrowers based on ProsperRating. If your rating is AA you will likely get a 0-10% borrower rate, if your rating is A, B, or C you will likely get 10-20%, for D and E it is between 20-30%, and for HR it is 30-40%. Note that while most HR listings have a high borrower rate (30-40%), there are still a few with a 10-20% borrower rate.
This plot is another look at plot 2 with the added dimension of listing creation date. The plot shows the trend of the borrower rate from 2009 to 2014, faceted by ProsperRating. We can see that a borrower rated AA in 2009 could get a 10-20% borrower rate. In 2013 and 2014, if you are rated AA you will get a 0-10% borrower rate. If you were rated E in 2009, you would most likely get a 30-40% rate, but in 2014 you can actually get a 20-30% borrower rate. The number of borrowers with a 20-30% borrower rate also increases significantly for rating E from 2012 onwards. For HR in 2009 and 2010, quite a number of borrowers could still get a 10-20% borrower rate, but from 2011 onward this is no longer possible.
Overall, it seems that in 2009 the borrower rate was actually quite high. Even with rating AA a borrower could still get a 20-30% borrower rate. If your rating was E in 2009, you were likely to get a 30-40% borrower rate. This no longer happens in 2014. In 2014, AA will most likely give you a 0-10% borrower rate, A, B, and C ratings will get 10-20%, and D and E most likely 20-30%. In 2014, with an HR rating you can still get a 20-30% borrower rate; in contrast, in 2011-2012 this was not possible.
The Prosper data has a lot of variables, so for the scope of this project I limited the number of variables to investigate. The first step was to select which variables to investigate. After much thought, I used the variables that a borrower can actually see on the loan listing page[1]. I did this because I assume these are the metrics that are important for a lender to look at before actually lending money, so they are a good starting point.
Initially I wanted to show the relationship between the variables with borrower rate, for instance debt to income ratio vs borrower rate, bankcard utilization vs borrower rate. To ease the exploration I have put several variables into “bins”. Putting it into bins makes it easier for me to show the relationships between variables.
It is also much easier to show relationships based on ProsperRating than on borrower rate. For instance, if we facet the debt to income ratio by ProsperRating, it is easier to see that the lower your debt to income ratio, the better your rating, and then to show that the better your rating, the better your borrower rate.
Even with this limited number of variables, there are a lot of things that we can investigate further. One thing we can look at is how a borrower gets a ProsperRating: what makes up the ProsperRating? I laid out several factors, but those factors are based on the Prosper UI[1]. The data also contains information on the borrower’s credit score, so it would be interesting to see what kind of relationship exists between credit score and ProsperRating.